TM 005 370

TITLE          Assessing the Fit of Data to the Rasch Model.
PUB DATE       [Apr 76]
NOTE           15p.; Paper presented at the Annual Meeting of the American Educational Research Association (60th, San Francisco, California, April 19-23, 1976)

EDRS PRICE     MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS    *Goodness of Fit; Guessing (Tests); *Item Analysis; *Mathematical Models; Response Style (Tests); Statistical Analysis; Test Bias; Test Wiseness
IDENTIFIERS    Practice; *Rasch Model; Speededness

ABSTRACT


This paper considers (1) the requirements imposed on data in order to conform to the Rasch model, (2) some common sources of departure from the model, and (3) a procedure for recognizing the occurrence of these disturbances. The specific disturbances discussed are guessing, practice, speededness, and bias. The observed characteristic curve for each situation is compared with the true logistic ogive and with the item characteristic curve that the Rasch estimation procedure would fit to such data. A convenient, interpretable form of residual between model and data is suggested. (Author)






ASSESSING THE FIT
OF DATA
TO THE RASCH MODEL



Ronald Mead
University of Chicago



Prepared for

American Educational Research Association
Annual Meeting
San Francisco

April, 1976






When Georg Rasch thought about what measurement meant to him, he arrived at the position that in order to obtain something he was willing to call a measurement, the situation had to be dominated by a single person parameter and a single item parameter. This led him to the mathematical expression:

(1)   $$\mathrm{Prob}(x_{vi} = 1 \mid \beta_v, \delta_i) = \frac{e^{\beta_v - \delta_i}}{1 + e^{\beta_v - \delta_i}}$$


and the corresponding picture (solid line in Figure 1a) which describes any possible person-item interaction. Contrary to popular belief, this expression does not define a religious cult. It can, however, lead to objective measurement in exchange for some reasonable if sometimes elusive requirements on the situation.
These requirements are:

(a) For a given item, an able person is always more likely to be right than an unable person.

(b) A given person is always more likely to answer an easy item correctly than a difficult one.
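
As an illustration, expression (1) and the two requirements can be checked numerically. This is a minimal sketch; the function name `rasch_probability` and the ability and difficulty values are hypothetical choices made for the example, not part of the paper.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under expression (1)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Requirement (a): for a fixed item, the more able person is more likely to be right.
item = 0.5
assert rasch_probability(1.0, item) > rasch_probability(-1.0, item)

# Requirement (b): for a fixed person, the easy item is more likely to be answered correctly.
person = 0.0
assert rasch_probability(person, -1.0) > rasch_probability(person, 1.0)

# When ability equals difficulty, the chance of success is one half.
print(rasch_probability(0.0, 0.0))
```

Both requirements follow from the fact that the probability depends only on the difference between ability and difficulty and rises steadily with it.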

This approach to measurement is unique because it is not the result of trying to describe whatever observations happen to look like but rather of deciding what they need to look like to be worthy of the name "measurements." The extent to which Rasch's model can be used to describe the real world has to be investigated empirically. But it is easy to think of cases which do not fit with the model and to understand why they should not be called measurements. Here are several well-known examples.

A. Random Guessing (Carelessness)

The model does not allow for random guessing. When this happens, the characteristic curve (Figure 1a) does not approach zero as the item becomes impossibly difficult. A person of lower ability has as good a chance of guessing the correct answer as does a person of higher ability.

The same sort of disturbance could occur in the upper tail if very able persons are careless when answering very easy items. Guessing and carelessness will lead us to underestimate the item's difficulty if persons of low ability were used in the calibrating sample and to overestimate its difficulty if persons of high ability are used.



[Figure 1a. Guessing and Carelessness. Vertical axis runs from 0.0 to 1.0; horizontal axis is $\beta - \delta$.]



B. SPEED (PRACTICE)

If people require several items to warm up before they can operate at true ability, the first few items on the instrument will be influenced by lack of practice. Analogously, if people do not finish, the last few items will be influenced by lack of speed. If true ability is unrelated to practice or speed effects, then items affected by practice or speed will seem more difficult and less steep than they are (Figure 2a). Their outcomes will be influenced by their positions on the instrument. If ability is low enough, there is very little chance of success regardless of the item's position. But for more able people, the likelihood of success is a function of $(\beta - \delta)$, as it should be, but also of the item's position, combined with how much the person is affected by the position. In the case of practice, after the person has answered a few items, he should be warmed up and able to perform at his true ability. Analogously, for speeded tests, a person's likelihood of success on the last item depends on his probability of success if he attempts the item and his probability of attempting it. These extraneous influences will lessen the item's power to discriminate on the true ability continuum.
[Figure 2a. Practice and Speed.]
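
The speeded case described above, where success on a late item requires both attempting it and answering it correctly, can be sketched as a product of two probabilities. The sketch below is an illustration under stated assumptions: `attempt_probability` is a hypothetical constant taken to be unrelated to ability, and the value 0.6 is made up.

```python
import math

def rasch_probability(ability, difficulty):
    """Model probability of success from expression (1)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def speeded_probability(ability, difficulty, attempt_probability):
    """Chance of success on an item the person may not reach.

    attempt_probability is a hypothetical constant, assumed unrelated to ability.
    """
    return attempt_probability * rasch_probability(ability, difficulty)

difficulty = 0.0
attempt = 0.6  # illustrative value only
for ability in (-3.0, 0.0, 3.0):
    model = rasch_probability(ability, difficulty)
    observed = speeded_probability(ability, difficulty, attempt)
    print(f"{ability:+.1f}  model={model:.3f}  observed={observed:.3f}  gap={model - observed:.3f}")
```

Under this assumption the observed curve lies below the model curve everywhere, the gap is small when ability is low, and it widens as ability increases, so the affected item looks harder and flatter than it is.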



C. BIAS 

There is increasing concern for finding items that are fair toward all subpopulations from which we might wish to measure people. One way in which biased items might be expected to operate is as follows. If an item is fair for population A, then the item characteristic curve is of the usual form. However, if the item involves skills or content or behavior beyond the ability of interest, and these skills are harder for a person in population B to acquire than they are for persons in population A, then the item is biased in favor of A. The characteristic curve (Figure 3a) for the item with people from B will be to the left and less steep than the characteristic curve for people from A. It is less steep because the probability of succeeding on the item depends on the probability of having acquired the other behaviors as well as on the position on the ability continuum. For a particular item this shape is the same as that for practice or speed. (In fact, practice and speed can be considered special cases of bias.)

II. Computation of Residuals

The way to determine if these disturbances are present or if it is reasonable to use the model as our explanation for a particular situation is to compare the observed outcome with the predicted outcome.




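When the model does account for the outcome and $P_{vi}$, the probability given by expression (1), is known, the mean and variance of the residual $x_{vi} - P_{vi}$ are

(3)   $$E(x_{vi} - P_{vi}) = 0.0$$
      $$\mathrm{Var}(x_{vi} - P_{vi}) = P_{vi}(1 - P_{vi})$$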



This residual is the difference between the observed ICC and the predicted ICC in the relative frequency metric, represented by vertical distances on the plots.

Often it is useful to transform to a standard statistic by subtracting its expectation and dividing by its standard deviation:

(4)   $$z_{vi} = \frac{x_{vi} - P_{vi}}{\sqrt{P_{vi}(1 - P_{vi})}}$$

which facilitates decisions about the statistical significance of the residual.

We could also look at the difference between the curves in the horizontal direction. The residual in this direction is in ability units (or logits).

Since the derivative of $(\beta_v - \delta_i)$ with respect to $P_{vi}$ can be viewed as the change in scale from the frequency metric to the ability metric, this can be approximated from $x_{vi} - P_{vi}$ by

(5)   $$y_{vi} = \frac{x_{vi} - P_{vi}}{P_{vi}(1 - P_{vi})}$$

with expectation zero and variance one over P times (1-P),

(6)   $$E(y_{vi}) = 0.0$$
      $$\mathrm{Var}(y_{vi}) = \frac{1.0}{P_{vi}(1 - P_{vi})} = \frac{1.0}{w_{vi}}$$
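
The three forms of residual described in this section can be computed for a single response as follows. This is a sketch for illustration; the function names and the chosen ability and difficulty are hypothetical. The last lines confirm the stated expectation and variance of the logit-metric residual by averaging over the two possible outcomes.

```python
import math

def rasch_probability(ability, difficulty):
    """Model probability of success from expression (1)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def residuals(x, p):
    """Return the raw, standardized, and logit-metric residuals.

    x is the observed response (0 or 1); p is the model probability.
    """
    w = p * (1.0 - p)                  # variance of x under the model
    raw = x - p                        # vertical distance, relative frequency metric
    standardized = raw / math.sqrt(w)  # expectation 0, variance 1
    logit = raw / w                    # horizontal distance, in logits
    return raw, standardized, logit

p = rasch_probability(1.0, 0.0)
for x in (0, 1):
    raw, standardized, logit = residuals(x, p)
    print(x, round(raw, 3), round(standardized, 3), round(logit, 3))

# Average over the two outcomes, weighting by their model probabilities.
y_right = residuals(1, p)[2]
y_wrong = residuals(0, p)[2]
mean_y = p * y_right + (1.0 - p) * y_wrong
var_y = p * y_right ** 2 + (1.0 - p) * y_wrong ** 2
assert abs(mean_y) < 1e-12
assert abs(var_y - 1.0 / (p * (1.0 - p))) < 1e-9
```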




Many of the disturbances discussed above have simple relationships to $y_{vi}$ which can be conveniently analyzed through weighted least squares using $w_{vi} = P_{vi}(1 - P_{vi})$ as the weight.
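
A weighted least squares trend line of this kind might be computed as in the sketch below. It is not the author's program: `weighted_line` is a hypothetical helper, the residuals are made-up values lying exactly on a line with a negative slope, and only the choice of weight, P(1-P), comes from the text.

```python
import math

def rasch_probability(t):
    """Model probability as a function of t = ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-t))

def weighted_line(t, y, w):
    """Fit y = intercept + slope * t, minimizing sum(w * (y - fit) ** 2)."""
    total = sum(w)
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (ti - t_bar) * (yi - y_bar) for wi, ti, yi in zip(w, t, y))
    sxx = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

t = [-2.0, -1.0, 0.0, 1.0, 2.0]                                # hypothetical ability - difficulty values
w = [rasch_probability(ti) * (1.0 - rasch_probability(ti)) for ti in t]
y = [-0.2 - 0.1 * ti for ti in t]                              # made-up residuals with a negative trend
intercept, slope = weighted_line(t, y, w)
print(round(intercept, 6), round(slope, 6))
```

The weights give the most influence to responses near the middle of the curve, where the logit-metric residual has the smallest variance.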




III. Patterns of Residuals

A. True Difficulties and Abilities are Known

Assuming the true parameters are known, the differences between the model and observed characteristic curves can be easily represented in terms of the residuals. According to the model, the residuals plotted against $(\beta_v - \delta_i)$ should fall along a horizontal line through the origin. Disturbances will appear as departures from this horizontal line.

For guessing, the residuals (Figure 1b) follow the horizontal line until the guessing becomes important. Then the residuals are positive since the person is doing better than expected and in that region have a negative trend. For carelessness, the residuals are negative when $(\beta_v - \delta_i)$ is large and the slope is again negative.
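
The signs and trends just described can be checked with a small calculation. The sketch below is an illustration, not the paper's procedure. It assumes that guessing adds a floor `GUESS` under the model probability and that carelessness lowers its ceiling by `CARELESS`; both constants and both functional forms are hypothetical.

```python
import math

def rasch_probability(t):
    """Model probability as a function of t = ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-t))

def expected_logit_residual(p_observed, p_model):
    """Expected value of (x - P) / (P (1 - P)) when x succeeds with p_observed."""
    return (p_observed - p_model) / (p_model * (1.0 - p_model))

GUESS = 0.2     # hypothetical chance of a lucky guess
CARELESS = 0.1  # hypothetical chance of a careless slip

ts = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
guessing = []
careless = []
for t in ts:
    p = rasch_probability(t)
    guessing.append(expected_logit_residual(GUESS + (1.0 - GUESS) * p, p))
    careless.append(expected_logit_residual((1.0 - CARELESS) * p, p))

for t, g, c in zip(ts, guessing, careless):
    print(f"{t:+.0f}  guessing={g:+.3f}  carelessness={c:+.3f}")
```

Under these assumptions the guessing residuals are positive and largest where the item is far too hard for the person, and the carelessness residuals are negative and largest in size where the item is far too easy, so both series fall as ability minus difficulty increases.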


[Figure 1b. Residuals for Guessing/Carelessness, with one curve labeled "guessing" and one labeled "carelessness".]



If either practice or speed is involved, the items which are affected display negative residuals (Figure 2b) with a negative trend line over the entire range of ability. This must be true since the probability of success if practice or speed is important is less than it should be and the curves become further apart as ability increases. The residual pattern for items biased against a subpopulation is the same shape. The two situations can be distinguished easily since for practice/speed the departures can be organized by item sequence number and for bias, by type of person.



[Figure 2b. Residuals for One Item with Practice/Speed or Bias. Vertical axis: $y_{vi}$; horizontal axis: $\beta - \delta$.]



B. Parameters are not Known 



The preceding discussion was greatly simplified by the assumption that the parameters were known. In practice they are not and any disturbances of the sort we have been considering affect our estimates of the parameters. Therefore, the item characteristic curve that we use for reference in the plots will not be the true ICC but the average of the observed curves, including the distorted ones.

If random guessing is a problem for some items (Figure 1c), the observed discriminating power for the average ICC will seem to be lowered because of the attempt to fit a simple logistic curve to the guessed-upon items. Depending on how the person abilities are distributed we would probably observe misfit over the entire range because of the distortion caused by guessing.

In terms of residuals, we would again observe a pattern (Figure 1d) which might be approximated by a second order polynomial in $(b_v - d_i)$. Because the curve is concave upward, it would have a positive quadratic coefficient.

If the problem were carelessness instead, the polynomial would appear concave downward and so have a negative quadratic coefficient. A third order polynomial would be required if both guessing and carelessness were active.
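
Reading the sign of the quadratic coefficient might be done as in the sketch below. It is an illustration under stated assumptions: `weighted_polyfit` is a hypothetical helper that solves the weighted normal equations directly, the residuals are made-up values lying on a concave-upward parabola, and the weights are P(1-P) evaluated at the estimated ability minus difficulty.

```python
import math

def rasch_probability(t):
    """Model probability as a function of t = ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-t))

def weighted_polyfit(t, y, w, degree):
    """Weighted least squares polynomial fit.

    Returns coefficients [c0, c1, ..., c_degree] minimizing
    sum(w * (y - sum(c_k * t ** k)) ** 2).
    """
    n = degree + 1
    # Build the normal equations as an augmented matrix.
    rows = []
    for r in range(n):
        row = [sum(wi * ti ** (r + c) for wi, ti in zip(w, t)) for c in range(n)]
        row.append(sum(wi * yi * ti ** r for wi, ti, yi in zip(w, t, y)))
        rows.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(rows[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (rows[r][n] - rest) / rows[r][r]
    return coef

t = [-2.0 + 0.5 * i for i in range(9)]          # hypothetical b - d values
w = [rasch_probability(ti) * (1.0 - rasch_probability(ti)) for ti in t]
y = [0.1 + 0.3 * ti ** 2 for ti in t]           # made-up concave-upward residuals
coef = weighted_polyfit(t, y, w, 2)
if coef[2] > 0:
    print("concave upward: pattern resembles guessing")
else:
    print("concave downward: pattern resembles carelessness")
```

Calling the same helper with `degree=3` would give the third order fit mentioned for the case where both disturbances are active.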



[Figure 1d. Residuals for Guessing/Carelessness.]



The disturbances in the central region, where neither guessing nor carelessness is thought to be operating, are due to the misinformation about ability and difficulty from the actions of guessing and carelessness. Significant negative residuals occur with negative values of $(b_v - d_i)$ because we have overestimated the ability of some persons and underestimated the difficulty of some items. The people were overestimated because they successfully guessed some items and the items were underestimated because some people successfully guessed on them. When we are operating under the assumption that the people are that able and the items are that easy, we are falsely alarmed by some missed items.

Practice and speed, when present, become an unwanted part of the "variable" that we observe. Practiced people score higher than unpracticed people of the same ability by doing well on early items. Fast people answer the last items and so score higher than slow people of the same ability. These phenomena would give the items affected the appearance of high discrimination which shows up as a positive trend in the residuals with respect to $(b_v - d_i)$. But this apparent discrimination is with respect to the "variable" that includes practice and speed effects, not the variable that we set out to measure.

Biased items could produce apparent high discriminations in a similar way. Persons in the favored group would tend to score high on the entire instrument but particularly on the most unfair items. These items would appear to have high discriminations but the inflation in their discriminations would be due to their power to distinguish between the favored and unfavored groups rather than between more able and less able persons (Figures 3c and d).

If we instead plot the residuals for persons against $b - d$, we usually find that the residuals have negative trends for persons from the unfavored group and positive trends for persons in the favored group (Figure 3e). This is because for the unfavored group, the items are "measuring" two variables and so have relatively less information about the variable that is of interest to us. This means that even if all items are unfair, the presence of a general bias should be indicated by a decrease in item discriminating power.

It must be kept in mind that when dealing with departures from the Rasch model, we have not achieved sample-free calibration. The patterns in the residual plots depend very much on who was in the calibration sample. The appearance of these effects will depend on the relative numbers of fair versus unfair items and favored versus unfavored persons.

Conclusion

I began this discussion by asking what sorts of things would cause data not to fit the Rasch model. After listing a few common disturbances, I discussed the straightforward approach that Prof. Wright and I have been using to discover if any of these disturbances are present. While our approach incorporates the use of well known least squares techniques, much work remains to be done before the properties of these techniques are well understood in our application.

All of the disturbances considered represent some form of multidimensionality; they would violate any model that assumes unidimensionality. Since the effect of the disturbances often appears as a change in the slope of the ICC, any model which includes item discrimination as a parameter would appear to fit such data. Thus we would be in the unfortunate situation of accepting data which violate the model's assumptions but pass the tests of fit.

By fitting such a general model, we would not only have lost the desirable measurement properties of the Rasch model, but we would have also misled ourselves about the true nature of the variable that we were seeking. When we understand a process well enough to control its multidimensionality, we have no need for any additional parameters. Until then, I do not think we can afford them.






